perf(crypto): specialize keccak256 for the fixed-shape Merkle parent hash #774
Open
Oppen wants to merge 2 commits into
hash_new_parent always hashes exactly 64 bytes (two 32-byte nodes), which fits the keccak rate in one block. Skip sha3's block_buffer/Digest machinery and run keccak-f[1600] directly for the Keccak256/32-byte case; fall back to the generic Digest path for every other backend instantiation. Cuts ~23M cycles (~1%) off the recursion verifier guest's multi-query profile.
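A minimal sketch of the single-permutation idea, assuming nothing about the actual code in this PR: `keccak256_sub_rate` and `hash_pair` are hypothetical names, and the permutation is written in the common compact in-place form rather than whatever layout the real helper uses. The input is XORed straight into the 25-lane state, the two padding markers are added, and keccak-f[1600] runs exactly once.

```rust
/// Keccak-f[1600] round constants (iota step).
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
/// Rho rotation offsets and pi lane order for the in-place rho/pi walk.
const ROTC: [u32; 24] = [
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44,
];
const PILN: [usize; 24] = [
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1,
];
/// Keccak-256 rate in bytes (1088 bits).
const RATE: usize = 136;

/// The keccak-f[1600] permutation over 25 little-endian 64-bit lanes.
fn keccak_f1600(st: &mut [u64; 25]) {
    for round in 0..24 {
        // Theta: fold each column's parity into the neighbouring columns.
        let mut bc = [0u64; 5];
        for i in 0..5 {
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) {
                st[j + i] ^= t;
            }
        }
        // Rho and pi: rotate each lane and move it to its new position.
        let mut t = st[1];
        for i in 0..24 {
            let j = PILN[i];
            let tmp = st[j];
            st[j] = t.rotate_left(ROTC[i]);
            t = tmp;
        }
        // Chi: the nonlinear step, applied row by row.
        for j in (0..25).step_by(5) {
            let row = [st[j], st[j + 1], st[j + 2], st[j + 3], st[j + 4]];
            for i in 0..5 {
                st[j + i] ^= !row[(i + 1) % 5] & row[(i + 2) % 5];
            }
        }
        // Iota: XOR in the round constant.
        st[0] ^= RC[round];
    }
}

/// Hypothetical helper: Keccak-256 of the concatenated `chunks`, which must
/// total fewer than RATE bytes so that message and padding fit in one block.
fn keccak256_sub_rate(chunks: &[&[u8]]) -> [u8; 32] {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    assert!(total < RATE, "input must fit in a single rate block");
    let mut st = [0u64; 25];
    let mut pos = 0usize;
    // Absorb: XOR each byte into its little-endian position in the state.
    for chunk in chunks {
        for &b in chunk.iter() {
            st[pos / 8] ^= (b as u64) << (8 * (pos % 8));
            pos += 1;
        }
    }
    // Keccak pad10*1: 0x01 right after the message, 0x80 in the last rate byte.
    st[pos / 8] ^= 0x01u64 << (8 * (pos % 8));
    st[(RATE - 1) / 8] ^= 0x80u64 << (8 * ((RATE - 1) % 8));
    keccak_f1600(&mut st);
    // Squeeze: the digest is the first four lanes, little-endian.
    let mut out = [0u8; 32];
    for i in 0..4 {
        out[8 * i..8 * i + 8].copy_from_slice(&st[i].to_le_bytes());
    }
    out
}

/// Hypothetical stand-in for the parent hash: Keccak-256 of `left || right`.
fn hash_pair(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    keccak256_sub_rate(&[&left[..], &right[..]])
}
```

The sketch absorbs byte by byte for clarity; a tuned version would presumably load whole lanes from `left` and `right`, since both are lane-aligned.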
Address adversarial-review findings on the previous commit:

- hash_new_parent's fast path now builds the keccak state directly from left/right (no intermediate 136-byte buffer, no owned-array copies) and is #[inline], so it collapses into callers instead of costing a real cross-crate call plus redundant copies.
- FieldElementPairBackend::hash_data (always 2 small elements, always single-block) gets the same single-permutation treatment.
- FieldElementVectorBackend::hash_data deliberately keeps the plain multi-block Digest path: its inputs are whole trace rows, which always exceed the keccak rate, so a "try one block, else fall back" attempt measured as a net regression (all cost, no payoff) rather than a win.
- Added tests pinning the TypeId dispatch itself (not just the inner permutation helpers) against an independent sha3 reference.

Cuts guest cycles further: single-query 89,632,723 -> 89,081,698, multi-query 2,187,109,919 -> 2,152,039,894 (vs. original baseline 89,721,844 / 2,210,366,539 -- roughly -0.7% / -2.6% total).
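The block-count reasoning behind keeping the vector backend on the generic path can be illustrated with a small sketch. `permutation_count` is a hypothetical helper, not something in this PR, and the 160-byte row in the comment is a made-up width chosen only to show an input above the rate.

```rust
/// Keccak-256 rate in bytes (1088 bits).
const KECCAK256_RATE_BYTES: usize = 136;

/// Number of keccak-f[1600] permutations needed to hash `len` bytes.
/// Padding always adds at least one byte, so an input of exactly one
/// rate's worth of bytes already spills into a second block.
fn permutation_count(len: usize) -> usize {
    len / KECCAK256_RATE_BYTES + 1
}

// Two 32-byte nodes (64 bytes) need a single permutation, while a
// hypothetical 160-byte trace row needs two, so a single-block attempt
// on such a row can never succeed and only adds a length check.
```

Under this arithmetic, any input of 136 bytes or more takes the multi-block route, which matches the observation that a one-block attempt on whole trace rows is pure overhead.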
Summary
- Specializes hash_new_parent (Merkle parent hash: always exactly 64 bytes, two 32-byte nodes) for Keccak256/32-byte nodes to a single hand-rolled keccak-f[1600] permutation, bypassing sha3's generic incremental-Digest/block_buffer machinery. Dispatch is a TypeId-based compile-time-constant check (D == Keccak256 && NUM_BYTES == 32); every other digest/size falls through unchanged to the original generic path.
- Specializes FieldElementPairBackend::hash_data (always exactly 2 small field elements, always sub-rate) the same way. Note: this backend is used prover-side only (FriLayerMerkleTreeBackend); the guest-side cycle win below comes entirely from the hash_new_parent change.
- FieldElementVectorBackend::hash_data (wide trace-row leaves): measured that a "try one block, else fall back" attempt is a net regression, since real rows always exceed the 136-byte keccak rate and the attempt is pure wasted overhead. Left on the generic path, with a comment explaining why.
- .refs/merkle-keccak_handoff.md; tier 2 (routing the permutation through the VM's keccak precompile) is explicitly out of scope here: it needs its own outer-proving-cost measurement gate.

Measurements (recursion verifier guest, cycle-accurate)
[cycle-count table not recovered; the only surviving cell text is "perf/rkyv-serialization)"]

In-VM correctness verified at every step: the guest's committed 32-byte output digest is byte-identical to baseline across all intermediate versions of this change.
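The TypeId-based dispatch described in the summary might look roughly like the sketch below. Everything here is an assumption for illustration: `Keccak256` and `OtherDigest` are local marker types standing in for real digest types, and `select_path` and `Path` are hypothetical names. Both sides of the comparison are fixed per instantiation, so the optimizer can typically fold the branch after monomorphization.

```rust
use std::any::TypeId;

// Hypothetical stand-ins for digest types; real code would name the
// actual hasher types here.
struct Keccak256;
struct OtherDigest;

#[derive(Debug, PartialEq)]
enum Path {
    /// Fixed-shape input: one keccak-f[1600] permutation.
    SinglePermutation,
    /// Anything else: the generic incremental Digest path.
    GenericDigest,
}

/// Pick the hashing path for digest type `D` and node size `NUM_BYTES`.
/// A mismatch on either condition falls back to the generic path.
#[inline]
fn select_path<D: 'static, const NUM_BYTES: usize>() -> Path {
    if TypeId::of::<D>() == TypeId::of::<Keccak256>() && NUM_BYTES == 32 {
        Path::SinglePermutation
    } else {
        Path::GenericDigest
    }
}
```

The fallback is the safe default in this shape: a digest or node size the check does not recognise never reaches the specialized code.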
Review
Went through two rounds of adversarial review (performance, correctness, cryptographic soundness, implementation simplicity) against this exact diff. First round surfaced real issues (parent-hash fast path was doing a redundant buffer copy and wasn't being inlined across the crate boundary; test coverage only pinned the inner permutation helper, not the dispatch condition itself); all fixed and re-measured. Second round came back clean: no correctness or soundness defects, byte-for-byte lane/padding arithmetic independently verified against the Keccak spec, prover/verifier path-selection symmetry confirmed, fail-safe TypeId-mismatch fallback re-verified.

Test plan
- cargo test --workspace --exclude math-cuda (math-cuda needs a GPU, not available here)
- make test-ethrex
- cargo test -p crypto -p stark
- Tests against sha3::Keccak256 (1000 random 64-byte pairs, all sub-rate lengths 0..135), and dispatch-level pinning tests comparing the trait-method output (not just the inner helper) against an independent reference
- cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture: in-VM verify accepted, output digest unchanged from baseline
- make test-profile-recursion-single / -multi: cycle counts above
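The sub-rate lengths 0..135 covered by the tests differ mainly in where the padding lands in the state. A small sketch of that lane arithmetic, using a hypothetical `padding_masks` helper that is not part of this PR:

```rust
/// Keccak-256 rate in bytes (1088 bits).
const RATE_BYTES: usize = 136;

/// For a sub-rate message of `len` bytes, return the (lane index, XOR mask)
/// pairs for the two Keccak pad10*1 markers: 0x01 in the byte right after
/// the message and 0x80 in the last byte of the rate. Lanes are 64-bit
/// little-endian, so byte `k` sits in lane `k / 8` at bit offset `8 * (k % 8)`.
fn padding_masks(len: usize) -> [(usize, u64); 2] {
    assert!(len < RATE_BYTES, "message must leave room for padding");
    let last = RATE_BYTES - 1;
    [
        (len / 8, 0x01u64 << (8 * (len % 8))),
        (last / 8, 0x80u64 << (8 * (last % 8))),
    ]
}
```

For the 64-byte parent hash the first marker falls at the start of lane 8 and the second at the top byte of lane 16. At length 135 both markers share one byte, which is why the boundary length is worth testing explicitly.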